CIM: Constrained Intrinsic Motivation for Sparse-Reward Continuous Control
Intrinsic motivation is a promising exploration technique for solving
reinforcement learning tasks with sparse or absent extrinsic rewards. There
exist two technical challenges in implementing intrinsic motivation: 1) how to
design a proper intrinsic objective to facilitate efficient exploration; and 2)
how to combine the intrinsic objective with the extrinsic objective to help
find better solutions. In the current literature, the intrinsic objectives are
all designed in a task-agnostic manner and combined with the extrinsic
objective via simple addition (or used on their own for reward-free
pre-training). In this work, we show that these designs fail in typical
sparse-reward
continuous control tasks. To address the problem, we propose Constrained
Intrinsic Motivation (CIM) to leverage readily attainable task priors to
construct a constrained intrinsic objective, and at the same time, exploit the
Lagrangian method to adaptively balance the intrinsic and extrinsic objectives
via a simultaneous-maximization framework. We empirically show, on multiple
sparse-reward continuous control tasks, that our CIM approach achieves greatly
improved performance and sample efficiency over state-of-the-art methods.
Moreover, the key techniques of our CIM can also be plugged into existing
methods to boost their performance.
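
To make the Lagrangian balancing concrete, the sketch below shows a generic primal/dual update of the kind the abstract describes. It is an illustration under assumptions, not CIM's actual algorithm: `j_ext_fn` and `j_int_fn` stand in for batch estimates of the extrinsic and intrinsic objectives, and `threshold` plays the role of a constraint level derived from task priors; all names and values are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Generic Lagrangian balancing of extrinsic and intrinsic objectives
# (an illustrative sketch, not CIM's exact update rule).

policy = nn.Linear(4, 2)                    # stand-in for a policy network
lam = torch.zeros(1, requires_grad=True)    # dual variable (pre-softplus)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
dual_opt = torch.optim.Adam([lam], lr=1e-2)
threshold = 0.5                             # assumed prior-derived constraint level

def update(obs, j_ext_fn, j_int_fn):
    """One simultaneous primal/dual step on the Lagrangian
    L = J_ext + lambda * (J_int - threshold), with lambda >= 0."""
    j_ext, j_int = j_ext_fn(policy, obs), j_int_fn(policy, obs)
    mult = F.softplus(lam)                  # keep the multiplier non-negative

    # Primal step: ascend the Lagrangian in the policy parameters, so both
    # objectives are maximized simultaneously with an adaptive weight.
    policy_loss = -(j_ext + mult.detach() * (j_int - threshold))
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    # Dual step: descend in lambda, so the multiplier grows while the
    # intrinsic constraint J_int >= threshold is violated and shrinks
    # once it is satisfied.
    dual_loss = F.softplus(lam) * (j_int.detach() - threshold)
    dual_opt.zero_grad()
    dual_loss.backward()
    dual_opt.step()
```

The adaptive multiplier is what replaces the fixed additive weighting criticized above: when the constrained objective is satisfied, its weight decays automatically instead of being hand-tuned.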
Distilling Cognitive Backdoor Patterns within an Image
This paper proposes a simple method to distill and detect backdoor patterns
within an image: \emph{Cognitive Distillation} (CD). The idea is to extract the
"minimal essence" from an input image responsible for the model's prediction.
CD optimizes an input mask to extract a small pattern from the input image that
can lead to the same model output (i.e., logits or deep features). The
extracted pattern can help understand the cognitive mechanism of a model on
clean vs. backdoor images and is thus called a \emph{Cognitive Pattern} (CP).
Using CD and the distilled CPs, we uncover an interesting phenomenon of
backdoor attacks: despite the various forms and sizes of trigger patterns used
by different attacks, the CPs of backdoor samples are all surprisingly and
suspiciously small. One can thus leverage the learned mask to detect and remove
backdoor examples from poisoned training datasets. We conduct extensive
experiments to show that CD can robustly detect a wide range of advanced
backdoor attacks. We also show that CD can potentially be applied to detect
biases in face datasets. Code is available at
\url{https://github.com/HanxunH/CognitiveDistillation}.
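
As an illustration of the mask-optimization idea, the following sketch learns a per-pixel input mask whose masked image reproduces the model's logits while staying sparse. It is a simplified reconstruction, not the released implementation: the noise fill, loss weights, and optimizer settings (`alpha`, `steps`, `lr`) are assumptions.

```python
import torch
import torch.nn.functional as F

# Simplified sketch of distilling a minimal input pattern: optimize a mask
# in [0, 1] so the masked input yields the same model output, while an L1
# penalty keeps the extracted pattern small.

def distill_pattern(model, x, steps=100, lr=0.1, alpha=0.01):
    """Return a (N, 1, H, W) mask for a batch x of shape (N, C, H, W)."""
    model.eval()
    with torch.no_grad():
        target = model(x)                       # logits on the unmasked input
    mask_param = torch.zeros(x.size(0), 1, *x.shape[2:], requires_grad=True)
    opt = torch.optim.Adam([mask_param], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(mask_param)        # constrain mask to [0, 1]
        noise = torch.rand_like(x)              # fill masked-out pixels
        x_masked = mask * x + (1 - mask) * noise
        match = F.l1_loss(model(x_masked), target)  # keep the output unchanged
        sparsity = mask.mean()                  # push toward a minimal pattern
        loss = match + alpha * sparsity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_param).detach()
```

Following the abstract's observation that backdoor samples yield suspiciously small cognitive patterns, the norm of the returned mask could then serve as a per-sample detection score for filtering a poisoned training set.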
- …